Exploring the Use of Hyper-Threading Technology for Multimedia Applications with Intel® OpenMP* Compiler
Authors
Abstract
Processors with Hyper-Threading technology can improve application performance by letting a single processor process data as if it were two processors, executing instructions from different threads in parallel rather than serially. However, the potential performance improvement can be obtained only if an application is multithreaded through parallelization techniques. This paper presents the threaded code generation and optimization techniques in the Intel C++/Fortran compiler. We conduct a performance study of two multimedia applications parallelized with OpenMP pragmas and compiled with the Intel compiler on Hyper-Threading technology (HT) enabled Intel single-processor and multi-processor systems. Our performance results show that the multithreaded code generated by the Intel compiler achieved speedups of up to 1.28x on an HT-enabled single-CPU system and up to 2.23x on an HT-enabled dual-CPU system. By measuring IPC (Instructions Per Cycle), UPC (Uops Per Cycle), and cache misses for both serial and multithreaded execution of each multimedia application, we make three key observations: (a) the multithreaded code generated by the Intel compiler yields a good performance gain when parallelization is guided by OpenMP pragmas or directives; (b) exploiting thread-level parallelism (TLP) causes inter-thread interference in caches and places greater demands on the memory system, but Hyper-Threading technology hides the additional latency, so the impact on whole-program performance is small; (c) Hyper-Threading technology is effective at exploiting both the task- and data-parallelism inherent in multimedia applications.
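As a rough illustration of what "parallelization guided by OpenMP pragmas" can look like for a data-parallel multimedia loop, here is a minimal sketch. The kernel (a sum of absolute differences over two pixel buffers) and the name `sum_abs_diff` are assumptions chosen for illustration; they are not taken from the paper's applications or from the Intel compiler's generated code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical kernel (not from the paper): sum of absolute differences
// between two equally sized 8-bit pixel buffers.
long long sum_abs_diff(const std::vector<std::uint8_t>& a,
                       const std::vector<std::uint8_t>& b) {
    assert(a.size() == b.size());
    const long long n = static_cast<long long>(a.size());
    long long sum = 0;

    // Loop iterations are independent, so OpenMP may divide them among
    // threads. The reduction clause gives each thread a private partial
    // sum and combines the partial sums when the loop finishes.
    // If OpenMP support is not enabled at compile time, the pragma is
    // ignored and the loop simply runs serially with the same result.
    #pragma omp parallel for reduction(+:sum)
    for (long long i = 0; i < n; ++i) {
        sum += std::abs(static_cast<int>(a[i]) - static_cast<int>(b[i]));
    }
    return sum;
}
```

Calling `sum_abs_diff` on two buffers of the same length returns the same value whether the loop runs on one thread or several; only the wall-clock time should differ, and any actual speedup depends on the hardware and workload.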
Similar resources
Efficient Multithreading Implementation of H.264 Encoder on Intel Hyper-Threading Architectures
Exploiting thread-level parallelism is a promising way to improve the performance of multimedia applications running on multithreading general-purpose processors. This paper describes our work in developing the first multithreading implementation of the H.264 encoder. We parallelize the encoder using the OpenMP programming model, which allows us to leverage the advanced compiler technology in t...
Intel OpenMP C++/Fortran Compiler for Hyper-Threading Technology: Implementation and Performance
In the never-ending quest for higher performance, CPUs become faster and faster. Processor resources, however, are generally underutilized by many applications. Intel’s Hyper-Threading Technology is developed to resolve this issue. This new technology allows a single processor to manage data as if it were two processors by executing data instructions from different threads in parallel rather th...
Performance Study of a Whole Genome Comparison Tool on a Hyper-Threading Multiprocessor
We developed a multithreaded parallel implementation of a sequence alignment algorithm that is able to align whole genomes with reliable output and reasonable cost. This paper presents a performance evaluation of the whole genome comparison tool called ATGC (Another Tool for Genome Comparison) on a Hyper-Threading multiprocessor. We use our application to determine the system scalability for thi...
Characterization of Multithreaded Scientific Workloads on Simultaneous Multithreading Intel Processors
Simultaneous Multithreading (SMT) is a technique that allows multiple independent threads to execute different instructions each cycle. Hyper-Threading (HT) is an implementation of SMT available on recent processors from Intel. Naturally, multithreaded applications are very suitable for SMT systems. However, HT, due to extensive resource sharing, may not suitably benefit OpenMP high performance ...
Media Applications on Hyper-Threading Technology
This paper characterizes selected workloads of multimedia applications on current superscalar architectures, and then characterizes the same workloads on Intel Hyper-Threading Technology. The workloads, including video encoding, decoding, and watermark detection, are optimized for the Intel Pentium 4 processor. One of the workloads is even commercially available and it performs best on the Pe...